Abstract

In 1994, Burrows and Wheeler [5] developed a data compression algorithm which 
performs significantly better than Lempel-Ziv based algorithms. Since then, a lot 
of work has been done in order to improve their algorithm, which is based on a 
reversible transformation of the input string, called BWT (the Burrows-Wheeler
transformation). In this paper, we propose a compression scheme based on BWT, 
MTF (move-to-front coding), and a version of the algorithms presented in [13]. 
1 Introduction

A very promising development in the field of lossless data compression is the
algorithm by Burrows and Wheeler [5]. Since its publication in 1994, their
algorithm has been widely studied, improved, and implemented on different
platforms. Their original algorithm, as reported in [5], achieves speed
comparable to Lempel-Ziv based algorithms and compression performance close to
the best PPM techniques [2].

The most interesting and unusual step in their compression scheme is a
reversible transformation of the input string (the Burrows-Wheeler
transformation, or BWT), which reorders the symbols such that the newly created
string contains the same symbols, but is easier to compress with simple locally
adaptive algorithms such as move-to-front coding (MTF) [3].

In this paper, we propose a compression scheme based on BWT, MTF, adaptive
codes [13,14], and a version of the algorithms presented in [13]. More
specifically, the following sections present a detailed description of our
algorithm in a progressive manner, including reports of experimental results.
As we shall see, experiments performed on various well-known proteins show
that on this type of information our algorithm outperforms the bzip2 utility
[11], which is a well-known implementation of the algorithm introduced by
Burrows and Wheeler.



2 Adaptive codes 

Adaptive codes have been recently presented in [13,14] as a new class of non- 
standard variable-length codes. The aim of this section is to briefly review 
some basic definitions and notations. For more details, the reader is referred 
to [13,14]. 

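We denote by $|S|$ the cardinality of the set $S$; if $x$ is a string of finite
length, then $|x|$ denotes the length of $x$. The empty string is denoted by
$\lambda$. For an alphabet $\Delta$, we denote by $\Delta^n$ the set
$\{s_1 s_2 \ldots s_n \mid s_i \in \Delta \text{ for all } i\}$, by $\Delta^*$
the set $\bigcup_{n=0}^{\infty} \Delta^n$, and by $\Delta^+$ the set
$\bigcup_{n=1}^{\infty} \Delta^n$, where $\Delta^0$ denotes the set
$\{\lambda\}$. Also, we denote by $\Delta^{\leq n}$ the set
$\bigcup_{i=0}^{n} \Delta^i$, and by $\Delta^{\geq n}$ the set
$\bigcup_{i=n}^{\infty} \Delta^i$.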

Let $X$ be a finite and nonempty subset of $\Delta^+$, and $w \in \Delta^+$. A
decomposition of $w$ over $X$ is any sequence of strings $u_1, u_2, \ldots, u_h$
with $u_i \in X$, $1 \leq i \leq h$, such that $w = u_1 u_2 \ldots u_h$. A code
over $\Delta$ is any nonempty set $C \subseteq \Delta^+$ such that each string
$w \in \Delta^+$ has at most one decomposition over $C$. A prefix code over
$\Delta$ is any code $C$ over $\Delta$ such that no string in $C$ is a proper
prefix of another string in $C$. If $u$, $v$ are two strings, then we denote by
$u \cdot v$, or simply by $uv$, the catenation of $u$ with $v$.

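Definition 1 Let $\Sigma$ and $\Delta$ be alphabets. A function
$c : \Sigma \times \Sigma^{\leq n} \to \Delta^+$, $n \geq 1$, is called adaptive
code of order $n$ if its unique homomorphic extension
$\bar{c} : \Sigma^* \to \Delta^*$ defined by:

• $\bar{c}(\lambda) = \lambda$

• $\bar{c}(\sigma_1 \sigma_2 \ldots \sigma_m) = c(\sigma_1, \lambda) \, c(\sigma_2, \sigma_1) \ldots c(\sigma_{n-1}, \sigma_1 \sigma_2 \ldots \sigma_{n-2}) \, c(\sigma_n, \sigma_1 \sigma_2 \ldots \sigma_{n-1}) \, c(\sigma_{n+1}, \sigma_1 \sigma_2 \ldots \sigma_n) \, c(\sigma_{n+2}, \sigma_2 \sigma_3 \ldots \sigma_{n+1}) \, c(\sigma_{n+3}, \sigma_3 \sigma_4 \ldots \sigma_{n+2}) \ldots c(\sigma_m, \sigma_{m-n} \sigma_{m-n+1} \ldots \sigma_{m-1})$

for all $\sigma_1 \sigma_2 \ldots \sigma_m \in \Sigma^+$, is injective.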

As it is clearly specified in the definition above, an adaptive code of order n 
associates a variable-length codeword to the symbol being encoded depending 
on the previous n symbols in the input data string. Let us take an example in 
order to better understand this mechanism. 
Example 2 Let $\Sigma = \{a, b, c\}$, $\Delta = \{0, 1\}$ be alphabets, and
$c : \Sigma \times \Sigma^{\leq 2} \to \Delta^+$ a function constructed by the
following table. One can easily verify that $\bar{c}$ is injective, and
according to Definition 1, $c$ is an adaptive code of order two.



Table 1. An adaptive code of order two. (Rows are indexed by the symbols of
$\Sigma$ and columns by the contexts in $\Sigma^{\leq 2}$; the codeword entries
are not recoverable in this copy.)



Let $x = abacca \in \Sigma^+$ be an input data string. Using the definition
above, we encode $x$ by
$\bar{c}(x) = c(a, \lambda) c(b, a) c(a, ab) c(c, ba) c(c, ac) c(a, cc) = 0101111110$.
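
The pairing of each symbol with its context can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `contexts` and `encode` are hypothetical, and `table` is an assumed dictionary representation of a function $c$ (it is not the table of Example 2).

```python
def contexts(x, n):
    """Pair each symbol of x with the (at most n) symbols that precede it."""
    return [(x[i], x[max(0, i - n):i]) for i in range(len(x))]


def encode(x, n, table):
    """Concatenate table[(symbol, context)] over x.

    `table` is an assumed dict from (symbol, context) pairs to codeword strings.
    """
    return "".join(table[pair] for pair in contexts(x, n))


# Contexts used when encoding x = abacca with an adaptive code of order two;
# the empty string plays the role of lambda.
print(contexts("abacca", 2))
# [('a', ''), ('b', 'a'), ('a', 'ab'), ('c', 'ba'), ('c', 'ac'), ('a', 'cc')]
```

The printed pairs correspond to the factors $c(a, \lambda)$, $c(b, a)$, $c(a, ab)$, $c(c, ba)$, $c(c, ac)$, $c(a, cc)$ of the encoding above.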

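Let $c : \Sigma \times \Sigma^{\leq n} \to \Delta^+$ be an adaptive code of
order $n$, $n \geq 1$. We denote by $C_{c, \sigma_1 \sigma_2 \ldots \sigma_h}$
the set $\{c(\sigma, \sigma_1 \sigma_2 \ldots \sigma_h) \mid \sigma \in \Sigma\}$,
for all $\sigma_1 \sigma_2 \ldots \sigma_h \in \Sigma^{\leq n} - \{\lambda\}$,
and by $C_{c, \lambda}$ the set $\{c(\sigma, \lambda) \mid \sigma \in \Sigma\}$.
We write $C_{\sigma_1 \sigma_2 \ldots \sigma_h}$ instead of
$C_{c, \sigma_1 \sigma_2 \ldots \sigma_h}$, and $C_\lambda$ instead of
$C_{c, \lambda}$ whenever there is no confusion. Let us denote by
$AC(\Sigma, \Delta, n)$ the set
$\{c : \Sigma \times \Sigma^{\leq n} \to \Delta^+ \mid c \text{ is an adaptive code of order } n\}$.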

Theorem 3 Let $\Sigma$ and $\Delta$ be two alphabets and
$c : \Sigma \times \Sigma^{\leq n} \to \Delta^+$ a function, $n \geq 1$. If
$C_u$ is prefix code, for all $u \in \Sigma^{\leq n}$, then
$c \in AC(\Sigma, \Delta, n)$.
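
The sufficient condition of Theorem 3 is easy to test mechanically. The sketch below assumes the function $c$ is stored as a dictionary mapping each context $u$ to a dictionary from symbols to codewords; the names `is_prefix_code` and `satisfies_theorem_3` are hypothetical. It additionally requires the codewords of one context to be pairwise distinct, which is an assumption made here because $C_u$ is defined as a set.

```python
def is_prefix_code(words):
    """True if the words are nonempty, distinct, and none is a prefix of another."""
    words = list(words)
    if "" in words or len(set(words)) != len(words):
        return False
    return not any(u != v and v.startswith(u) for u in words for v in words)


def satisfies_theorem_3(table):
    """Check that the codewords of every context form a prefix code.

    `table` is an assumed dict: context string -> {symbol: codeword}.
    """
    return all(is_prefix_code(row.values()) for row in table.values())


good = {"": {"a": "0", "b": "10", "c": "11"}, "a": {"a": "1", "b": "00", "c": "01"}}
bad = {"": {"a": "0", "b": "01"}}  # "0" is a proper prefix of "01"
print(satisfies_theorem_3(good), satisfies_theorem_3(bad))  # True False
```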



3 A high performance BWT-based compression scheme 

As we have already pointed out, the algorithm introduced in 1994 by Burrows 
and Wheeler [5] is one of the greatest developments in the field of lossless data 
compression. 

Their algorithm has received special attention not only for its Lempel-Ziv 
like execution speed and compression performance close to the best statistical 
modelling techniques [2], but also for the algorithms it combines. Let us briefly 
describe the three steps in their compression scheme. 

BWT. Let $S$ be a string of length $n$ which is to be compressed. The idea is
to apply a reversible transformation (called BWT, or the Burrows-Wheeler
transformation) to the string $S$ in order to form a new string $S'$, which
contains the same symbols. The purpose of this transformation is to group
together instances of a symbol $x_i$ occurring in $S$. More precisely, if a
symbol $x_i$ is very often followed by $x_j$ in $S$, then the occurrences of
$x_i$ tend to be grouped together in $S'$. Thus, $S'$ has a high locality of
reference and is easier to compress with simple locally adaptive compression
schemes such as move-to-front coding (MTF).

MTF. The idea of move-to-front coding (MTF) is based on self-organizing
linear lists. Let $L$ be a linear list containing the symbols which occur in
$S'$. If $x_i$ is the current symbol in $S'$ which is to be encoded, then the
encoder looks up the current position of $x_i$ in $L$, outputs that position,
and updates $L$ by moving $x_i$ to the front of the list.

EC. A final entropy coding (EC) step follows the move-to-front encoder. Since
the output of MTF usually consists of small integers, it can be efficiently
encoded using a Huffman encoder.

This is the algorithm which has led to the development of one of the best 
techniques in the field of lossless data compression. Let us present a detailed 
description of BWT and MTF, since our encoder is also based on these algo- 
rithms. 

Algorithm BWT. Let $\Sigma = \{\sigma_0, \sigma_1, \ldots, \sigma_{p-1}\}$ be an
ordered set, and let $S = s_1 s_2 \ldots s_n$ be a string over $\Sigma$, that
is, $s_i \in \Sigma$ for all $i \in \{1, 2, \ldots, n\}$. If $M$ is a matrix,
$M[i,j]$ denotes the $j$-th element (from left to right) of the $i$-th row.

INPUT: the string $S = s_1 s_2 \ldots s_n$ of length $n$.

1. Let $M$ be an $n \times n$ matrix whose elements are symbols, and whose rows
are the rotations (cyclic shifts) of $S$, sorted in lexicographical order.
Precisely, if $s_{k_1} s_{k_2} \ldots s_{k_n}$ is the $i$-th rotation of $S$
(in lexicographical order), then $M[i,j] = s_{k_j}$ for all
$j \in \{1, 2, \ldots, n\}$.

2. Let $I$ be the index of the first row in $M$ which contains the string $S$
(there is at least one such row). Exactly, $I$ is the smallest integer such
that $M[I,j] = s_j$ for all $j \in \{1, 2, \ldots, n\}$.

3. Let $S' = t_1 t_2 \ldots t_n$ be the string contained in the last column of
the matrix $M$, that is, $t_i = M[i,n]$ for all $i \in \{1, 2, \ldots, n\}$.

OUTPUT: the 2-tuple $(S', I)$.
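
The three steps above can be sketched directly in Python. This is a naive illustration that materializes every rotation (practical implementations avoid building the full matrix); the function name `bwt` is hypothetical, and the input is assumed to be nonempty.

```python
def bwt(s):
    """Return (S', I) for a nonempty string s.

    S' is the last column of the lexicographically sorted rotations of s,
    and I is the 1-based index of the first row equal to s.
    """
    n = len(s)
    rotations = sorted(s[k:] + s[:k] for k in range(n))  # rows of M
    last_column = "".join(row[-1] for row in rotations)  # t_i = M[i, n]
    index = rotations.index(s) + 1                        # first row equal to s
    return last_column, index


print(bwt("research"))  # ('ersrcahe', 7)
```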

Interestingly enough, there exists an efficient algorithm which reconstructs
the original string $S$ using only $S'$ and $I$. The paper by Burrows and
Wheeler [5] gives a very detailed description of this algorithm, so we do not
repeat it here. Instead, let us explain why the transformed string $S'$
compresses much better than $S$. Consider a symbol $x_i$ which is very often
followed by $x_j$ in $S$. Since the rows of $M$ are the sorted rotations of
$S$, and the symbol $M[i,n]$ precedes the symbol $M[i,1]$ in $S$, for all
$i \in \{1, 2, \ldots, n\}$, some consecutive rotations that start with $x_j$
are likely to end in $x_i$. This is why $S'$ has a high locality of reference,
and is easier to compress with locally adaptive compression schemes such as
MTF.

Let us now introduce some useful notation. Let
$\mathcal{U} = (u_1, u_2, \ldots, u_k)$ be a $k$-tuple. We denote by
$\mathcal{U}.i$ the $i$-th component of $\mathcal{U}$, that is,
$\mathcal{U}.i = u_i$ for all $i \in \{1, 2, \ldots, k\}$. The 0-tuple is
denoted by $()$. The length of a tuple $\mathcal{U}$ is denoted by
$Len(\mathcal{U})$. If $\mathcal{V} = (v_1, v_2, \ldots, v_b)$,
$\mathcal{M} = (m_1, m_2, \ldots, m_r, \mathcal{U})$,
$\mathcal{N} = (n_1, n_2, \ldots, n_s, \mathcal{V})$,
$\mathcal{P} = (p_1, \ldots, p_{i-1}, p_i, p_{i+1}, \ldots, p_t)$ are tuples,
and $q$ is an element or a tuple, then we define
$\mathcal{P} \triangleleft q$, $\mathcal{P} \triangleright i$,
$\mathcal{U} \triangle \mathcal{V}$, and $\mathcal{M} \diamond \mathcal{N}$ by:

• $\mathcal{P} \triangleleft q = (p_1, \ldots, p_t, q)$

• $\mathcal{P} \triangleright i = (p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_t)$

• $\mathcal{U} \triangle \mathcal{V} = (u_1, u_2, \ldots, u_k, v_1, v_2, \ldots, v_b)$

• $\mathcal{M} \diamond \mathcal{N} = (m_1 + n_1, m_2 + 1, \ldots, m_r + 1, n_2 + 1, \ldots, n_s + 1, \mathcal{U} \triangle \mathcal{V})$

where $m_1, m_2, \ldots, m_r, n_1, n_2, \ldots, n_s$ are integers.
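
As a reading aid, the four tuple operations can be mirrored on Python tuples. The function names `append`, `remove`, `concat`, and `merge` are hypothetical stand-ins for the four operators, and component positions are 1-based as in the text.

```python
def append(p, q):
    """Append: add q (an element or a tuple) as a new last component of p."""
    return p + (q,)


def remove(p, i):
    """Remove: drop the i-th component of p (1-based)."""
    return p[:i - 1] + p[i:]


def concat(u, v):
    """Concatenate the components of u and v."""
    return u + v


def merge(m, n):
    """Merge two tuples whose last component is itself a tuple.

    The first integers are added, the remaining integers are each increased
    by one, and the two trailing tuples are concatenated.
    """
    *ms, u = m
    *ns, v = n
    head = (ms[0] + ns[0],)
    rest = tuple(k + 1 for k in ms[1:]) + tuple(k + 1 for k in ns[1:])
    return head + rest + (concat(u, v),)


print(merge((3, 0, (1,)), (5, 0, (2,))))  # (8, 1, 1, (1, 2))
```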

Algorithm MTF. Let $S' = t_1 t_2 \ldots t_n$ be the string obtained above. The
MTF encoder works as follows.

INPUT: the string $S' = t_1 t_2 \ldots t_n$ of length $n$.

1. Consider a linear list $L$ which contains the symbols occurring in $S'$
exactly once, sorted in lexicographical order. Also, let $\mathcal{R} := ()$.

2. For each $i = 1, 2, \ldots, n$ execute:

2.1. Let $q$ be the number of elements preceding $t_i$ in $L$.

2.2. $\mathcal{R} := \mathcal{R} \triangleleft q$.

2.3. In the list $L$, move $t_i$ to the front of the list.

OUTPUT: the $n$-tuple $\mathcal{R}$.
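
A minimal Python sketch of the MTF encoder described above follows; the name `mtf` is hypothetical, and a plain Python list stands in for the linear list $L$.

```python
def mtf(s):
    """Move-to-front encode s, starting from the sorted list of its symbols."""
    lst = sorted(set(s))   # step 1: each occurring symbol exactly once, sorted
    ranks = []
    for t in s:            # step 2
        q = lst.index(t)   # 2.1: number of elements preceding t in the list
        ranks.append(q)    # 2.2: append q to the output
        lst.insert(0, lst.pop(q))  # 2.3: move t to the front
    return tuple(ranks)


print(mtf("ersrcahe"))  # (2, 4, 5, 1, 4, 4, 5, 5)
```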

Example 4 Let $\Sigma = \{a, c, e, h, r, s\}$ be an alphabet, and consider the
string $S = research$ over $\Sigma$. One can verify that:

M =
    a r c h r e s e
    c h r e s e a r
    e a r c h r e s
    e s e a r c h r
    h r e s e a r c
    r c h r e s e a
    r e s e a r c h
    s e a r c h r e

is the matrix containing the sorted rotations of $S$, $I = 7$,
$S' = ersrcahe$, and $\mathcal{R} = (2, 4, 5, 1, 4, 4, 5, 5)$.

At this point, it should be clear that by applying BWT and MTF as described
so far, the compression of the string $S$ is reduced to the compression of the
tuple $\mathcal{R}$. Also, if $S$ is sufficiently large (at least several
kilobytes), then the tuple $\mathcal{R}$ will consist mostly of large blocks of
zeroes. For other details on these algorithms (including implementation
details) the reader is referred to [5].

New algorithms for data compression, based on adaptive codes of order one,
have been recently presented in [13,14], where we have shown experimentally
that for a large class of input strings, our algorithms substantially
outperform the well-known Lempel-Ziv compression technique [17,18]. The final
encoder in our compression scheme is based on the algorithms proposed in [13].
Before describing it in detail, let us review the Huffman algorithm, since our
encoder is partly based on this well-known compression technique. For further
details on the Huffman algorithm, the reader is referred to [7,10].

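Algorithm Huffman. As described below, the well-known Huffman algorithm takes
as input a tuple $\mathcal{F} = (f_1, f_2, \ldots, f_n)$ of frequencies, and
returns a tuple $\mathcal{V} = (v_1, v_2, \ldots, v_n)$ of codewords, such that
$v_i$ is the codeword corresponding to the symbol with the frequency $f_i$, for
all $i \in \{1, 2, \ldots, n\}$.

INPUT: a tuple $\mathcal{F} = (f_1, f_2, \ldots, f_n)$ of frequencies.

1. Consider the $n$-tuples
$\mathcal{L} = ((f_1, 0, (1)), (f_2, 0, (2)), \ldots, (f_n, 0, (n)))$ and
$\mathcal{V} = (\lambda, \lambda, \ldots, \lambda)$.

2. If $n = 1$ then $\mathcal{V}.1 := 0$.

3. While $Len(\mathcal{L}) > 1$ execute:

3.1. Let $i < j$ be the smallest integers such that $\mathcal{L}.i.1$,
$\mathcal{L}.j.1$ are the smallest elements of the set
$\{\mathcal{L}.q.1 \mid q \in \{1, 2, \ldots, Len(\mathcal{L})\}\}$.

3.2. $F := \{\mathcal{L}.i.Len(\mathcal{L}.i).r \mid r \in \{1, 2, \ldots, Len(\mathcal{L}.i.Len(\mathcal{L}.i))\}\}$.

3.3. $S := \{\mathcal{L}.j.Len(\mathcal{L}.j).r \mid r \in \{1, 2, \ldots, Len(\mathcal{L}.j.Len(\mathcal{L}.j))\}\}$.

3.4. For each $x \in F$ execute $\mathcal{V}.x := 0 \cdot \mathcal{V}.x$.

3.5. For each $x \in S$ execute $\mathcal{V}.x := 1 \cdot \mathcal{V}.x$.

3.6. $\mathcal{U} := \mathcal{L}.i \diamond \mathcal{L}.j$;
$\mathcal{L} := \mathcal{L} \triangleright j$;
$\mathcal{L} := \mathcal{L} \triangleright i$;
$\mathcal{L} := \mathcal{L} \triangleleft \mathcal{U}$.

OUTPUT: the tuple $\mathcal{V}$.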
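
The procedure above can be sketched in Python as follows. Each entry of `nodes` keeps only a total frequency and the list of leaf indices below it, which is a simplification of the tuples stored in $\mathcal{L}$. The tie-breaking rule (equal frequencies are resolved by position in the list) is an assumption about how step 3.1 is meant to be read, and the name `huffman` is hypothetical.

```python
def huffman(freqs):
    """Return one codeword (a string of 0s and 1s) per input frequency."""
    codes = [""] * len(freqs)
    if len(freqs) == 1:
        codes[0] = "0"                      # step 2: a single symbol gets "0"
    # Each node: (total frequency, indices of the leaves it covers).
    nodes = [(f, [k]) for k, f in enumerate(freqs)]
    while len(nodes) > 1:
        # Step 3.1: positions of the two smallest frequencies, ties by position.
        order = sorted(range(len(nodes)), key=lambda q: (nodes[q][0], q))
        i, j = sorted(order[:2])
        for k in nodes[i][1]:               # step 3.4: prepend 0
            codes[k] = "0" + codes[k]
        for k in nodes[j][1]:               # step 3.5: prepend 1
            codes[k] = "1" + codes[k]
        merged = (nodes[i][0] + nodes[j][0], nodes[i][1] + nodes[j][1])
        del nodes[j], nodes[i]              # step 3.6: remove j first, then i
        nodes.append(merged)
    return codes


print(huffman([1, 2]))     # ['0', '1']
print(huffman([1, 1, 2]))  # ['10', '11', '0']
```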

Algorithm AE (Adaptive Encoder). The final encoding step in our compression
scheme is based on the algorithms presented in [13], that is, on adaptive
codes of order one. As we have already discussed, the input of this final
encoder is the output of MTF, that is, the tuple $\mathcal{R}$. Let
$\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_p\}$ be an alphabet, and let $x$
be a string over $\Sigma$. Let $q$ be the number of symbols occurring in $x$
(thus, $q \leq p$). Let us explain the main idea of our scheme. Consider that
$u \in \Sigma^n$ is some substring of the input string $x$. Also, let us denote
by $Follow(u)$ the set of symbols that follow the substring $u$ in $x$. For
each symbol $c \in Follow(u)$, let us denote by $Freq(c, u)$ the frequency of
the substring $uc$ in $x$. One can easily remark that $Follow(u)$ cannot
contain more than $q$ symbols. Moreover, in most cases, the number of symbols
in $Follow(u)$ is significantly smaller than $q$. Instead of applying Huffman's
algorithm to the $q$ symbols occurring in $x$, we apply it to the set
$\{Freq(c, u) \mid c \in Follow(u)\}$, since this set has a smaller number of
frequencies. If $code(c, u)$ is the codeword associated to $Freq(c, u)$, then
we encode $c$ by $code(c, u)$ if it is preceded by $u$. Thus, we get smaller
codewords.
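
The sets $Follow(u)$ and the frequencies $Freq(c, u)$ can be collected in one pass over the input. The sketch below is only an illustration; the function name `follow_frequencies` and the sample string are assumptions made here.

```python
from collections import Counter, defaultdict


def follow_frequencies(x, n):
    """Map each length-n context u to a Counter of the symbols that follow it.

    The keys of table[u] form Follow(u); table[u][c] is Freq(c, u).
    """
    table = defaultdict(Counter)
    for i in range(n, len(x)):
        table[x[i - n:i]][x[i]] += 1
    return dict(table)


freq = follow_frequencies("baabbabab", 2)
print(sorted(freq["ba"].items()))  # [('a', 1), ('b', 2)]
print(sorted(freq["aa"].items()))  # [('b', 1)]
```

In this sample, the context `ba` is followed by two different symbols, whereas `aa` is followed by only one, so only the former needs a non-trivial code.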

This procedure is actually applied to every substring $u$ of length $n$
occurring in $x$. Thus, we associate to each symbol a set of codewords, and
encode every symbol with one of the codewords in its set, depending on the
previous $n$ symbols. The complete algorithm is given in Fig. 1.

INPUT: a string $x = x_1 x_2 \ldots x_t \in \Sigma^+$.

1. Let $a : \Sigma \times \Sigma^n \to \{0,1\}^*$,
   $b : \Sigma \times \Sigma^n \to \{0,1\}$,
   $c : \Sigma \times \Sigma^n \to \mathbb{N}$ be three functions;
2. Let $d : \{1, 2, \ldots, p^n\} \to \Sigma^n$ be a bijective function;
3. for each $(\sigma, u) \in \Sigma \times \Sigma^n$ do
   3.1. $a(\sigma, u) \leftarrow \lambda$;
   3.2. $b(\sigma, u) \leftarrow 0$;
   3.3. $c(\sigma, u) \leftarrow 0$;
4. for $i = n + 1$ to $t$ do
   4.1. $b(x_i, x_{i-n} \ldots x_{i-1}) \leftarrow 1$;
   4.2. $c(x_i, x_{i-n} \ldots x_{i-1}) \leftarrow c(x_i, x_{i-n} \ldots x_{i-1}) + 1$;
5. for $j = 1$ to $p^n$ do
   5.1. $\mathcal{S} \leftarrow ()$; $k \leftarrow 1$;
   5.2. for $i = 1$ to $p$ do
      5.2.1. if $b(\sigma_i, d(j)) = 1$ then
         5.2.1.1. $\mathcal{S} \leftarrow \mathcal{S} \triangleleft c(\sigma_i, d(j))$;
   5.3. if $Len(\mathcal{S}) \geq 2$ then
      5.3.1. $\mathcal{V} \leftarrow$ Huffman($\mathcal{S}$);
      5.4. for $i = 1$ to $p$ do
         5.4.1. if $b(\sigma_i, d(j)) = 1$ then
            5.4.1.1. $a(\sigma_i, d(j)) \leftarrow \mathcal{V}.k$;
            5.4.1.2. $k \leftarrow k + 1$;
6. $y \leftarrow ()$; $Z \leftarrow \lambda$;
7. for $i = 1$ to $p$ do
   7.1. for $j = 1$ to $p^n$ do
      7.1.1. if $a(\sigma_i, d(j)) \neq \lambda$ then
         7.1.1.1. $y \leftarrow y \triangleleft a(\sigma_i, d(j))$;
8. for $i = n + 1$ to $t$ do
   8.1. $Z \leftarrow Z \cdot a(x_i, x_{i-n} \ldots x_{i-1})$;

OUTPUT: the tuple $(x_1 x_2 \ldots x_n, b, y, Z)$.

Fig. 1. EAHn.

Let us now explain what exactly the algorithm performs at each step. The first
three steps initialize the functions needed. Note that the function $d$ allows
us to access the elements of $\Sigma^n$ in a certain order. In the fourth
step, $b(x_i, x_{i-n} \ldots x_{i-1})$ is switched to 1, since the substring
$x_{i-n} \ldots x_{i-1} x_i$ occurs at least once in $x$, and the frequency of
$x_{i-n} \ldots x_{i-1} x_i$ is incremented. In the fifth step, for every
substring $d(j)$ of length $n$, we apply Huffman's algorithm to the symbols
following $d(j)$ in $x$. In the next two steps, $y$ is a tuple of codewords
constructed as follows. If $c \in \Sigma$ and $u \in \Sigma^n$, then $a(c, u)$
is appended to $y$ if and only if $a(c, u) \neq \lambda$, that is, if
$c \in Follow(u)$ and $|Follow(u)| \geq 2$. Finally, in the last step, $Z$
denotes the compression of $x_{n+1} \ldots x_t$.
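
To make the steps concrete, here is a Python sketch of the procedure under a few stated assumptions: the functions $a$, $b$, $c$ are stored as dictionaries keyed by (symbol, context) pairs, the enumeration $d$ is the lexicographic order produced by `itertools.product`, codewords are assigned only when at least two symbols follow a context (as the explanation above states), and ties between equal frequencies are broken by position. The names `eah` and `huffman` are hypothetical.

```python
from itertools import product


def huffman(freqs):
    """Codewords for a list of frequencies (ties broken by position)."""
    codes = [""] * len(freqs)
    if len(freqs) == 1:
        codes[0] = "0"
    nodes = [(f, [k]) for k, f in enumerate(freqs)]
    while len(nodes) > 1:
        order = sorted(range(len(nodes)), key=lambda q: (nodes[q][0], q))
        i, j = sorted(order[:2])
        for k in nodes[i][1]:
            codes[k] = "0" + codes[k]
        for k in nodes[j][1]:
            codes[k] = "1" + codes[k]
        merged = (nodes[i][0] + nodes[j][0], nodes[i][1] + nodes[j][1])
        del nodes[j], nodes[i]
        nodes.append(merged)
    return codes


def eah(x, n, alphabet):
    """Return (x_1...x_n, b, y, Z) for the string x and order n."""
    ctxs = ["".join(u) for u in product(alphabet, repeat=n)]  # role of d
    a = {(s, u): "" for s in alphabet for u in ctxs}          # steps 1-3
    b = {(s, u): 0 for s in alphabet for u in ctxs}
    c = {(s, u): 0 for s in alphabet for u in ctxs}
    for i in range(n, len(x)):                                # step 4
        key = (x[i], x[i - n:i])
        b[key] = 1
        c[key] += 1
    for u in ctxs:                                            # step 5
        seen = [s for s in alphabet if b[(s, u)] == 1]
        if len(seen) >= 2:
            for s, code in zip(seen, huffman([c[(s, u)] for s in seen])):
                a[(s, u)] = code
    y = tuple(a[(s, u)] for s in alphabet for u in ctxs if a[(s, u)])  # 6-7
    z = "".join(a[(x[i], x[i - n:i])] for i in range(n, len(x)))       # 8
    return x[:n], b, y, z


prefix, b, y, z = eah("baabbabab", 2, "ab")
print(prefix, y, z)  # ba ('0', '0', '1', '1') 01101
```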

So, the compression of the string $x$ is actually $Z$. The first three
components of the output ($x_1 x_2 \ldots x_n$, $b$, and $y$) are only needed
when decoding $Z$ into $x$.

Let us now take an example in order to better understand the description 
above. 

Example 5 Let $\Sigma = \{a, b\}$ be an alphabet, and let us take
$x = baabbabab \in \Sigma^+$ as an input data string. After applying EAH2 to
$x$, we get the results reported in the tables below.

Table 1

The function a after EAH2(x).

$\Sigma \backslash \Sigma^2$    aa           ab    ba    bb
a                               $\lambda$    0     0     $\lambda$
b                               $\lambda$    1     1     $\lambda$



Table 2

The function b after EAH2(x).

$\Sigma \backslash \Sigma^2$    aa    ab    ba    bb
a                               0     1     1     1
b                               1     1     1     0



Table 3

The function c after EAH2(x).

$\Sigma \backslash \Sigma^2$    aa    ab    ba    bb
a                               0     1     1     1
b                               1     1     2     0



Let us explain these results by considering the third column of each table. In
the second table, $b(a, ba) = 1$ and $b(b, ba) = 1$, since the substrings $baa$
and $bab$ both occur at least once in $x$. In the third table,
$c(a, ba) = 1$ is the frequency of $baa$ in $x$, and $c(b, ba) = 2$, since
$bab$ occurs twice in $x$. Thus, applying Huffman's algorithm to the set of
frequencies $\{1, 2\}$, we encode $a$ (if it is preceded by $ba$) by
$a(a, ba) = 0$. Also, if $b$ is preceded by $ba$, then we encode it by
$a(b, ba) = 1$.

Considering that the function $d$ is given by $d(1) = aa$, $d(2) = ab$,
$d(3) = ba$, and $d(4) = bb$, one can verify that the output of EAH2 in this
example is the 4-tuple:

$(ba, b, (0, 0, 1, 1), 01101)$,

where $b$ is the function given above. Also, one can remark that the function
$b$ can be encoded using $p^{n+1}$ bits. In our example, $b$ can be encoded by
$2^3 = 8$ bits, since $p = 2$ and $n = 2$.

As one can remark, some new notations have already been used above. Specifi- 
cally, if A is an algorithm and x its input, then we denote by A(x) its output. 
Also, N denotes the set of natural numbers. 

Algorithm Encoder 1. We are now ready to describe our compression scheme
based on BWT, MTF, and adaptive codes of order one. Consider the alphabet
$\Sigma = \{\sigma_0, \sigma_1, \ldots, \sigma_{p-1}\}$ fixed above.

INPUT: the string $S = s_1 s_2 \ldots s_n$ of length $n$ over $\Sigma$.

1. $\mathcal{X} :=$ BWT($S$).

2. $\mathcal{Y} :=$ MTF($\mathcal{X}.1$).

3. $\mathcal{Z} :=$ AE($\mathcal{Y}$).

OUTPUT: the 2-tuple $(\mathcal{X}.2, \mathcal{Z})$.
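
The composition of the three stages can be sketched as follows. The final encoder is passed in as a parameter, so this example does not commit to any particular implementation of AE; the names `bwt`, `mtf`, `encoder1`, and `final_encoder` are hypothetical, and the BWT shown is the naive version that sorts all rotations.

```python
def bwt(s):
    """Naive BWT: last column of the sorted rotations and the 1-based row of s."""
    rotations = sorted(s[k:] + s[:k] for k in range(len(s)))
    return "".join(r[-1] for r in rotations), rotations.index(s) + 1


def mtf(s):
    """Move-to-front ranks, starting from the sorted list of occurring symbols."""
    lst, ranks = sorted(set(s)), []
    for t in s:
        q = lst.index(t)
        ranks.append(q)
        lst.insert(0, lst.pop(q))
    return tuple(ranks)


def encoder1(s, final_encoder):
    """Chain BWT, MTF, and a caller-supplied final encoder (standing in for AE)."""
    last_column, index = bwt(s)          # step 1: X := BWT(S)
    ranks = mtf(last_column)             # step 2: Y := MTF(X.1)
    return index, final_encoder(ranks)   # output: (X.2, AE(Y))


# With an identity function in place of AE, the MTF ranks pass through unchanged.
print(encoder1("research", lambda ranks: ranks))
# (7, (2, 4, 5, 1, 4, 4, 5, 5))
```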

As we have already pointed out in the beginning of this paper, our compression
scheme performs much better on proteins than on other types of information.
For this reason, we report experimental results obtained only on this type of
file. Specifically, we have tested our compressor on five well-known
biological sequences: E.coli, hi, hs, mj, and sc. The last four files form the
Protein Corpus [8]. Let us briefly describe each file separately.

E.coli. One of the most studied biological sequences, Escherichia coli (usually 
abbreviated to E.coli), is a bacterium that lives in warm-blooded organisms. 
This genome is the only biological sequence included in the Large Canter- 
bury Corpus [1]. 

hi. Haemophilus influenzae (abbreviated H. influenzae, or hi) is a bacterium 
that causes ear and respiratory infections in children. It was the first fully 
sequenced genome, made available in 1996. This genome is 1.83 megabases 
in size, and contains approximately 1740 potential genes. When these genes 
are translated into proteins, the resulting file is approximately 500 kilobytes 
in size (representing each amino acid as one byte). 

hs. Homo sapiens (abbreviated H. sapiens, or hs) contains 5733 human genes, 
and the resulting protein file is approximately 3.3 megabytes in size. 

mj. Methanococcus jannaschii (abbreviated M.jannaschii, or mj) lives in very
hot undersea vents and has a unique metabolism. It is 1.7 megabases in
size, contains 1680 genes, and the resulting protein file is approximately 450
kilobytes in size.

sc. Saccharomyces cerevisiae (abbreviated S.cerevisiae, or sc) has been stud- 
ied as a model organism for several decades. At 13 megabases in size, it is 
one of the largest sequenced organisms. 

The results reported below have been obtained by comparing Encoder1 with
two of the best compressors available: gzip and bzip2.

gzip (version 1.3.3). This is one of the most used UNIX utilities, and is 
based on Lempel-Ziv coding (LZ77). 

bzip2 (version 1.0.2). This programme compresses files using the Burrows- 
Wheeler block sorting text compression algorithm, and Huffman coding. 
Compression is generally considerably better than that achieved by the 
LZ77/LZ78-based compressors (including gzip), and approaches the per- 
formance of the PPM family of statistical compressors. 



Table 3. Results of compressing five protein files with gzip and bzip2.

File     Size (bytes)   gzip (bytes)   bits/symbol   bzip2 (bytes)   bits/symbol   Improvement (bytes)   Improvement (%)
E.coli   4,638,690      1,299,066      2.24          1,251,004       2.16          48,062                3.70
hi       509,519        297,517        4.67          275,412         4.32          22,105                7.43
hs       3,295,751      1,897,311      4.61          1,753,321       4.26          143,990               7.59
mj       448,779        257,373        4.59          239,480         4.27          17,893                6.95
sc       2,900,352      1,682,108      4.64          1,558,813       4.30          123,295               7.33
Total    11,793,091     5,433,375                    5,078,030                     355,345




Table 4. Results of compressing five protein files with bzip2 and Encoder1.

File     Size (bytes)   bzip2 (bytes)   bits/symbol   Encoder1 (bytes)   bits/symbol   Improvement (bytes)   Improvement (%)
E.coli   4,638,690      1,251,004       2.16          1,159,813          2.00          91,191                7.29
hi       509,519        275,412         4.32          274,115            4.30          1,297                 0.47
hs       3,295,751      1,753,321       4.26          1,728,061          4.19          25,260                1.44
mj       448,779        239,480         4.27          238,294            4.25          1,186                 0.50
sc       2,900,352      1,558,813       4.30          1,539,390          4.25          19,423                1.25
Total    11,793,091     5,078,030                     4,939,673                        138,357
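
The derived columns of the two results tables appear to be consistent with the following arithmetic: bits per symbol is eight times the compressed size divided by the original size, and the improvement is the size difference relative to the baseline compressor. The sketch below checks this reading on the E.coli rows; the function names are hypothetical, and the formulas are an inference from the reported numbers rather than something the text states.

```python
def bits_per_symbol(compressed_bytes, original_bytes):
    """Average number of output bits spent per input byte (one symbol per byte)."""
    return 8 * compressed_bytes / original_bytes


def improvement_percent(baseline_bytes, new_bytes):
    """Size reduction of the new compressor relative to the baseline, in percent."""
    return 100 * (baseline_bytes - new_bytes) / baseline_bytes


# E.coli: gzip versus bzip2.
print(round(bits_per_symbol(1299066, 4638690), 2))      # 2.24
print(round(improvement_percent(1299066, 1251004), 2))  # 3.7

# E.coli: bzip2 versus Encoder1.
print(round(bits_per_symbol(1159813, 4638690), 2))      # 2.0
print(round(improvement_percent(1251004, 1159813), 2))  # 7.29
```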





Given the results reported here (improvements over bzip2 between 0.47% and
7.29% on the five files tested), one can conclude that our compression scheme
is competitive in the field of biological data compression.

Further work in this field is intended to compare our compression scheme with
some of the best PPM techniques as they are being developed for (biological)
data compression. We welcome any suggestions or comments, especially from
the readers interested in these matters.